480 research outputs found

Communities as Well Separated Subgraphs With Cohesive Cores: Identification of Core-Periphery Structures in Link Communities

Author: B Ball
C Piccardi
E Ravasz
F Radicchi
FD Rossa
J Shi
J Yang
J Yang
JP Bagrow
M Girvan
MEJ Newman
P Csermely
R Kannan
S Fortunato
S Kojaku
SE Schaeffer
SP Borgatti
TS Evans
W Zachary
YY Ahn
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/10/2018

Communities in networks are commonly considered as highly cohesive subgraphs which are well separated from the rest of the network. However, cohesion and separation often cannot be maximized at the same time, which is why a compromise is sought by some methods. When a compromise is not suitable for the problem to be solved, it might be advantageous to separate the two criteria. In this paper, we explore such an approach by defining communities as well separated subgraphs which can have one or more cohesive cores surrounded by peripheries. We apply this idea to link communities and present an algorithm for constructing hierarchical core-periphery structures in link communities and first test results. Comment: 12 pages, 2 figures, submitted version of a paper accepted for the 7th International Conference on Complex Networks and Their Applications, December 11-13, 2018, Cambridge, UK; revised version at http://141.20.126.227/~qm/papers

arXiv.org e-Print Archive

Crossref

Do logarithmic proximity measures outperform plain ones in graph clustering?

Author: AD Gvishiani
CSJA Nash-Williams
DJ Klein
E Estrada
F Buckley
F Chung
F Fouss
I Kivimäki
JH Ward Jr
L Hubert
L Yen
M Girvan
MEJ Newman
P Chebotarev
P Chebotarev
P Chebotarev
P Chebotarev
P Chebotarev
PY Chebotarev
PY Chebotarev
PY Chebotarev
S Fortunato
SE Schaeffer
Publication date: 18/02/2017

We consider a number of graph kernels and proximity measures including commute time kernel, regularized Laplacian kernel, heat kernel, exponential diffusion kernel (also called "communicability"), etc., and the corresponding distances as applied to clustering nodes in random graphs and several well-known datasets. The model of generating random graphs involves edge probabilities for the pairs of nodes that belong to the same class or different predefined classes of nodes. It turns out that in most cases, logarithmic measures (i.e., measures resulting after taking the logarithm of the proximities) perform better at distinguishing underlying classes than the "plain" measures. A comparison in terms of reject curves of inter-class and intra-class distances confirms this conclusion. A similar conclusion can be made for several well-known datasets. A possible origin of this effect is that most kernels have a multiplicative nature, while the nature of distances used in cluster algorithms is an additive one (cf. the triangle inequality). The logarithmic transformation is a tool to transform the first nature to the second one. Moreover, some distances corresponding to the logarithmic measures possess a meaningful cutpoint additivity property. In our experiments, the leader is usually the logarithmic Communicability measure. However, we indicate some more complicated cases in which other measures, typically Communicability and plain Walk, can be the winners. Comment: 11 pages, 5 tables, 9 figures. Accepted for publication in the Proceedings of 6th International Conference on Network Analysis, May 26-28, 2016, Nizhny Novgorod, Russi
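The contrast between a "plain" and a logarithmic measure can be illustrated with a minimal sketch. It assumes the communicability kernel is the matrix exponential of the adjacency matrix (approximated here by a truncated Taylor series) and that a distance is derived from a proximity matrix H as d(i, j) = (h_ii + h_jj - h_ij - h_ji) / 2. Both choices, and the helper names `mat_exp` and `distance_from_kernel`, are assumptions made for illustration and are not taken from the paper.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(a, terms=30):
    """Approximate exp(a) with a truncated Taylor series (small graphs only)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def distance_from_kernel(kernel, logarithmic):
    """Turn a proximity matrix into distances, optionally after an
    elementwise logarithm (assumed construction, see the lead-in)."""
    n = len(kernel)
    h = [[math.log(x) for x in row] for row in kernel] if logarithmic else kernel
    return [[0.5 * (h[i][i] + h[j][j] - h[i][j] - h[j][i]) for j in range(n)]
            for i in range(n)]


# Path graph 0 - 1 - 2 - 3 (connected, so every kernel entry is positive).
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
kernel = mat_exp(adjacency)
plain = distance_from_kernel(kernel, logarithmic=False)
logged = distance_from_kernel(kernel, logarithmic=True)
print("plain d(0,1), d(0,3):", plain[0][1], plain[0][3])
print("log   d(0,1), d(0,3):", logged[0][1], logged[0][3])
```

The elementwise logarithm requires strictly positive proximities, which holds for this kernel on a connected graph; a disconnected graph would need separate handling.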

arXiv.org e-Print Archive

Crossref

Outlier Edge Detection Using Random Graph Generation Models and Applications

Author: A Lancichinetti
AK Jain
DJ Watts
G Karypis
H Zhang
J Leskovec
J Shi
J Yang
L Akoglu
L Danon
L Danon
L Liu
L Lu
L Waltman
LC Freeman
M Choudhury De
M Coscia
M Newman
M Rosvall
ME Newman
ME Newman
MEJ Newman
MR Brito
R Yu
S Fortunato
S Lloyd
S Papadopoulos
SE Schaeffer
VD Blondel
VJ Hodge
X Dong
Publication date: 21/06/2016

Outliers are samples that are generated by different mechanisms from other normal data samples. Graphs, in particular social network graphs, may contain nodes and edges that are made by scammers, malicious programs or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has merely focused on detecting outlier nodes. In this article, we study the properties of edges and propose outlier edge detection algorithms using two random graph generation models. We found that the edge-ego-network, which can be defined as the induced graph that contains two end nodes of an edge, their neighboring nodes and the edges that link these nodes, contains critical information to detect outlier edges. We evaluated the proposed algorithms by injecting outlier edges into some real-world graph data. Experimental results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently gives good performance regardless of the test graph data. Furthermore, the proposed algorithms are not limited to the area of outlier edge detection. We demonstrate three different applications that benefit from the proposed algorithms: 1) a preprocessing tool that improves the performance of graph clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques. Comment: 14 pages, 5 figures, journal pape
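The edge-ego-network defined in the abstract (the two end nodes of an edge, their neighbors, and all edges among those nodes) can be extracted as in the following minimal sketch. The function name `edge_ego_network` and the edge-list input format are assumptions for illustration; the paper's scoring of outlier edges is not reproduced here.

```python
def edge_ego_network(edges, u, v):
    """Return (nodes, induced_edges) of the edge-ego-network of edge (u, v).

    `edges` is an iterable of undirected (a, b) pairs; this input format is
    an assumption of the sketch.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # End nodes plus all of their neighbors.
    nodes = {u, v} | adjacency.get(u, set()) | adjacency.get(v, set())

    # Induced subgraph: keep every edge whose two ends are both in `nodes`.
    induced = {tuple(sorted((a, b)))
               for a, b in edges if a in nodes and b in nodes}
    return nodes, induced


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
nodes, induced = edge_ego_network(edges, 1, 2)
print(sorted(nodes))    # node 4 is not adjacent to 1 or 2, so it is left out
print(sorted(induced))
```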

arXiv.org e-Print Archive

Qatar University Institutional Repository

Crossref

Directory of Open Access Journals

Trepo - Institutional Repository of Tampere University

Searching for network modules

Author: A Lancichinetti
A Lancichinetti
AE Brower
B Adamcsek
B Bollobás
E Boros
GC Rota
Hao Wu
I Gilboa
I Gilboa
J Reichardt
J Vlasblom
J Wang
Jierui Xie
Jose B. Pereira-Leal
M Aigner
M Szalay-Bekő
MC Schmidt
MEJ Newman
MEJ Newman
MEJ Newman
MEJ Newman
MEJ Newman
R Diestel
R Sharan
R Stanley
Randolf Rotta
S Asur
S Fortunato
S Miyamoto
S Zhang
SE Schaeffer
T Nepusz
T Yu
Tom C. Freeman
U Brandes
X Lei
Y Li
YY Ahn
Publication date: 07/09/2018

When analyzing complex networks, a key target is to uncover their modular structure, which means searching for a family of modules, namely node subsets each spanning a subnetwork more densely connected than the average. This work proposes a novel type of objective function for graph clustering, in the form of a multilinear polynomial whose coefficients are determined by network topology. It may be thought of as a potential function, to be maximized, taking its values on fuzzy clusterings or families of fuzzy subsets of nodes over which every node distributes a unit membership. When suitably parametrized, this potential is shown to attain its maximum when every node concentrates all of its unit membership on some module. The output thus is a partition, while the original discrete optimization problem is turned into a continuous version that allows alternative search strategies to be conceived. The instance of the problem being a pseudo-Boolean function assigning real-valued cluster scores to node subsets, modularity maximization is employed to exemplify a so-called quadratic form, in that the scores of singletons and pairs also fully determine the scores of larger clusters, while the resulting multilinear polynomial potential function has degree 2. After considering further quadratic instances, different from modularity and obtained by interpreting network topology in alternative manners, a greedy local-search strategy for the continuous framework is analytically compared with an existing greedy agglomerative procedure for the discrete case. Overlapping is finally discussed in terms of multiple runs, i.e. several local searches with different initializations. Comment: 10 page
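The sense in which modularity is "quadratic" can be seen from its standard pairwise form, Q = (1/2m) Σ_ij (A_ij - k_i k_j / 2m) δ(c_i, c_j): the score of a cluster is a sum of contributions from single nodes and node pairs only. The sketch below evaluates that textbook formula for a hard partition of a simple undirected graph without self-loops or repeated edges. It is an illustration of the quadratic form only, not the paper's multilinear potential, and the function name `modularity` is chosen here.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a hard partition.

    `edges` lists each undirected edge once; `community` maps node -> label.
    Assumes a simple graph (no self-loops, no duplicate edges).
    """
    m = len(edges)
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    edge_set = {frozenset(e) for e in edges}

    total = 0.0
    for i in degree:
        for j in degree:
            if community[i] != community[j]:
                continue  # only same-cluster pairs (including i == j) count
            a_ij = 1.0 if frozenset((i, j)) in edge_set else 0.0
            total += a_ij - degree[i] * degree[j] / (2.0 * m)
    return total / (2.0 * m)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "a" for node in range(6)}
print(modularity(edges, two_modules))
print(modularity(edges, one_module))
```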

arXiv.org e-Print Archive

Crossref

Consensus clustering in complex networks

Author: A Clauset
A Clauset
A Clauset
A Condon
A Lancichinetti
A Lancichinetti
A Lancichinetti
A Lancichinetti
A Lancichinetti
A Lancichinetti
A Strehl
A Topchy
BH Good
D Watts
F Radicchi
G Palla
G Palla
H Kwak
J Hopcroft
JG White
L Danon
M Girvan
M Rosvall
M Sales-Pardo
MA Porter
MEJ Newman
MEJ Newman
MEJ Newman
MEJ Newman
P Jaccard
PJ Mucha
R Albert
S Boccaletti
S Fortunato
S Fortunato
S Kirkpatrick
SE Schaeffer
SN Dorogovtsev
UN Raghavan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

The community structure of complex networks reveals both their organization and hidden relationships among their constituents. Most community detection methods currently available are not deterministic, and their results typically depend on the specific random seeds, initial conditions and tie-break rules adopted for their execution. Consensus clustering is used in data analysis to generate stable results out of a set of partitions delivered by stochastic methods. Here we show that consensus clustering can be combined with any existing method in a self-consistent way, enhancing considerably both the stability and the accuracy of the resulting partitions. This framework is also particularly suitable to monitor the evolution of community structure in temporal networks. An application of consensus clustering to a large citation network of physics papers demonstrates its capability to keep track of the birth, death and diversification of topics. Comment: 11 pages, 12 figures. Published in Scientific Report
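One common way to combine a set of partitions is a co-occurrence (consensus) matrix whose entry (i, j) is the fraction of runs that place nodes i and j in the same cluster, with weak entries then filtered out by a threshold. The sketch below assumes that construction; the function names and the threshold value are illustrative, and the iterative re-clustering of the matrix described in the abstract as "self-consistent" is not implemented.

```python
def consensus_matrix(partitions):
    """Fraction of partitions in which each node pair shares a cluster.

    `partitions` is a list of label lists, one label per node per run.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(partitions)
    return [[c / runs for c in row] for row in counts]


def apply_threshold(matrix, tau):
    """Zero out entries below tau (tau is a free parameter of this sketch)."""
    return [[x if x >= tau else 0.0 for x in row] for row in matrix]


# Three runs of a hypothetical stochastic method on four nodes.
runs = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 1, 1, 1]]
matrix = consensus_matrix(runs)
filtered = apply_threshold(matrix, tau=0.7)
for row in filtered:
    print(row)
```

Pairs that co-occur in every run keep weight 1, while pairs grouped together only occasionally are dropped, which is what makes the combined result less sensitive to any single run.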

arXiv.org e-Print Archive

Crossref

PubMed Central

Aaltodoc Publication Archive

Structural and Algorithmic Properties of 2-Community Structures

Author: AE Feldmann
C Bazgan
C Bazgan
C Bazgan
C Bazgan
Cristina Bazgan
J Chlebikova
Janka Chlebikova
K Andreev
M Olsen
MEJ Newman
P Kristiansen
R Aharoni
S Fortunato
SE Schaeffer
Thomas Pontoizeau
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A Statistical Test of Heterogeneous Subgraph Densities to Assess Clusterability

Author: A Barrat
A Fronczak
AL Barabási
E Arias-Castro
F Aleskerov
J Leskovec
L Lovász
L Ostroumova Prokhorenkova
N Alon
N Verzelen
O Goldreich
P Erdös
P Miasnikof
PJ Bickel
R Albert
S Fortunato
Santo Fortunato
SE Schaeffer
Sergiy Butenko
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Determining if a graph displays a clustered structure prior to subjecting it to any cluster detection technique has recently gained attention in the literature. Attempting to group graph vertices into clusters when a graph does not have a clustered structure is not only a waste of time; it will also lead to misleading conclusions. To address this problem, we introduce a novel statistical test, the-test, which is based on comparisons of local and global densities. Our goal is to assess whether a given graph meets the necessary conditions to be meaningfully summarized by clusters of vertices. We empirically explore our test's behavior under a number of graph structures. We also compare it to other recently published tests. From a theoretical standpoint, our test is more general, versatile and transparent than recently published competing techniques. It is based on the examination of intuitive quantities, applies equally to weighted and unweighted graphs and allows comparisons across graphs. More importantly, it does not rely on any distributional assumptions, other than the universally accepted definition of a clustered graph. Empirically, our test is shown to be more responsive to graph structure than other competing tests.

Crossref

Queen Mary Research Online

Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction

Author: C Zach
DP Bertsekas
DW Marquardt
ED Dolan
G Calafiore
K Wilson
MC Campi
MI Lourakis
MR Hestenes
P Li
PL Combettes
PR Amestoy
R Mur-Artal
S Agarwal
SE Schaeffer
TA Davis
V Rotkin
VD Blondel
Y Jeong
Y-D Jian
Publication date: 02/08/2020

Current bundle adjustment solvers such as the Levenberg-Marquardt (LM) algorithm are limited by the bottleneck in solving the Reduced Camera System (RCS) whose dimension is proportional to the camera number. When the problem is scaled up, this step is neither efficient in computation nor manageable for a single compute node. In this work, we propose a stochastic bundle adjustment algorithm which seeks to decompose the RCS approximately inside the LM iterations to improve the efficiency and scalability. It first reformulates the quadratic programming problem of an LM iteration based on the clustering of the visibility graph by introducing the equality constraints across clusters. Then, we propose to relax it into a chance constrained problem and solve it through sampled convex program. The relaxation is intended to eliminate the interdependence between clusters embodied by the constraints, so that a large RCS can be decomposed into independent linear sub-problems. Numerical experiments on unordered Internet image sets and sequential SLAM image sets, as well as distributed experiments on large-scale datasets, have demonstrated the high efficiency and scalability of the proposed approach. Codes are released at https://github.com/zlthinker/STBA. Comment: Accepted by ECCV 202

arXiv.org e-Print Archive

Crossref

An efficient record linkage scheme using graphical analysis for identifier error detection

Author: A Arasu
A McCallum
A Sarah Walker
David H Wyllie
DH Wyllie
DH Wyllie
DH Wyllie
DV Kalashnikov
EA Sauleau
I Fellegi
John M Finney
L Phillips
RA Lyons
S Chapman
S Deepayan
SE Schaeffer
T Teorey
Tim EA Peto
V Levenhstein
V Rares
WE Winkler
WN Venables
Publication venue: BioMed Central
Publication date: 01/01/2011

Integration of information on individuals (record linkage) is a key problem in healthcare delivery, epidemiology, and "business intelligence" applications. It is now common to be required to link very large numbers of records, often containing various combinations of theoretically unique identifiers, such as NHS numbers, which are both incomplete and error-prone.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery

Oxford University Research Archive

S3-TM: scalable streaming short text matching

Author: A Carzaniga
A Shraer
Buğra Gedik
DM Blei
Fuat Basık
G Karypis
Hakan Ferhatosmanoğlu
K Karanasos
L Liu
M Castro
Mert Emin Kalender
PT Eugster
SE Schaeffer
T Yan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Micro-blogging services have become major venues for information creation, as well as channels of information dissemination. Accordingly, monitoring them for relevant information is a critical capability. This is typically achieved by registering content-based subscriptions with the micro-blogging service. Such subscriptions are long-running queries that are evaluated against the stream of posts. Given the popularity and scale of micro-blogging services like Twitter and Weibo, building a scalable infrastructure to evaluate these subscriptions is a challenge. To address this challenge, we present the S3-TM system for streaming short text matching. S3-TM is organized as a stream processing application, in the form of a data parallel flow graph designed to be run on a data center environment. It takes advantage of the structure of the publications (posts) and subscriptions to perform the matching in a scalable manner, without broadcasting publications or subscriptions to all of the matcher instances. The basic design of S3-TM uses a scoped multicast for publications and scoped anycast for subscriptions. To further improve throughput, we introduce publication routing algorithms that aim at minimizing the scope of the multicasts. The first set of algorithms we develop is based on partitioning the word co-occurrence frequency graph, with the aim of routing posts that include commonly co-occurring words to a small set of matchers. While effective, these algorithms fall short in balancing the load. To address this, we develop the SALB algorithm, which provides better load balance by modeling the load more accurately using the word-to-post bipartite graph. We also develop a subscription placement algorithm, called LASP, to group together similar subscriptions, in order to minimize the subscription matching cost. Furthermore, to achieve good scalability for an increasing number of nodes, we introduce techniques to handle workload skew. Finally, we introduce load shedding techniques for handling unexpected load spikes with small impact on the accuracy. Our experimental results show that S3-TM is scalable. Furthermore, the SALB algorithm provides more than 2.5× throughput compared to the baseline multicast and outperforms the graph partitioning-based approaches. © 2015, Springer-Verlag Berlin Heidelberg

Crossref

Bilkent University Institutional Repository

Warwick Research Archives Portal Repository